Time to take the red pill


Listening to presentations, and talking to delegates, at Internet Librarian International 
2005 (ILI) last week, I was reminded of the film The Matrix. In the movie, the main 
character is offered an opportunity and a choice: he can take the red pill and see the 
truth; or he can take the blue pill and return, comfortably unaware, to the illusion that 
is the world of the Matrix, and life will simply carry on as before. 


With the Internet continuing to challenge their traditional skills and roles, information 
professionals face a not dissimilar choice: embrace the reality of the new world they 
inhabit, or seek to deny it, clinging to a now outdated illusion of reality. 


Disconcerting 


For while information professionals initially welcomed the arrival of the Internet, many 
have become increasingly concerned that it poses a significant threat to their settled 
world. 


This concern was all too evident at ILI, with both delegates and presenters clearly of 
the view that many traditional notions of information science are under attack from 
the Web. Long-standing classification systems, for instance, are threatened by newer 
notions of categorisation; hierarchical indexing is having to give way to the flat 
indexing of the Web; and taxonomies face growing pressure from new-fangled 
concepts like folksonomies. 


For information professionals — who pride themselves on the many skills and 
techniques that they have developed over the years — this is both disorientating and 
distressful. If that were not enough, the Web challenges the very notion that 
information intermediaries have a role to play any more in a networked world. 


None of these anxieties are new, of course, but the depth and intensity of the pain 
information professionals are experiencing was all too palpable at the London event. 
Certainly there was a desperate need to appear relevant. As one librarian plaintively 
put it, "We need to find ways to put ourselves back between the information and the 
user." 


That said, some information professionals — generally the younger ones — are 
embracing the new world. Michael Stephens, a special projects librarian at St. 
Joseph County Public Library in Indiana, for instance, gave a presentation in which 
he talked with great enthusiasm about how libraries can exploit wikis, instant 
messaging, and podcasts to enhance the services they provide for patrons. 


Stephens also bravely volunteered to defend folksonomies from the caustic tongue of 
UKOLN's Brian Kelly who, amongst other things, publicly critiqued Stephens's 
"inadequate" use of tags when labelling photographs of his dog Jake on the social 
networking site Flickr. Kelly's aim was to demonstrate that folksonomies are a pale 
shadow of traditional classification, even in the hands of a trained librarian. 


Grumpy old men 


All in all, it felt at times as if ILI was awash with grumpy old men muttering 
bad-temperedly about the good old days, and the shocking ignorance of the young. 


This attitude was best exemplified in the keynote given by information industry 
personality Stephen Arnold. In a paper entitled Relevance and the Future of Search, 
Arnold complained that the traditional view of relevance in online searching was 
under siege on the Web. 


Specifically, information science's notion of precision and recall (where precision 
measures how well retrieved documents meet the needs of the user, and recall 
measures how many of the relevant documents were actually retrieved) was being 
destroyed by the practices of web search engines, particularly Google. 
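
For readers who have not met the two measures before, they can be stated as simple ratios. The following is a minimal sketch in Python, assuming we already know which documents are relevant to a query; the function name and the document IDs are invented for illustration and do not come from Arnold's paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = share of the retrieved documents that are relevant
    recall    = share of the relevant documents that were retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents come back, five are actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7", "d8", "d9"]
print(precision_recall(retrieved, relevant))  # (0.5, 0.4)
```

Here two of the four retrieved documents are relevant (precision 0.5), but only two of the five relevant documents were found (recall 0.4).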


This state of affairs, he argued, is being driven by the desire to monetise the Web, 
not least through Google's pioneering of advertising-driven search models. When a 
user does a search on Google, for instance, the resulting pages of "organic results" 
(i.e. the product of Google's search algorithm) are placed alongside links paid for by 
advertisers. Unfortunately, said Arnold, over 90% of users do not differentiate 
between the paid listings and organic results. 


Entirely alien 


The situation is aggravated, he added, because people don’t generally click through 
many pages of search results. This encourages owners of web sites to exploit 
Google's search algorithms in order to push links to their sites higher up Google's 
search page. Indeed, said Arnold, a large and powerful Search Engine Optimisation 
(SEO) industry has been created precisely in order to sell services aimed at "fixing" 
search results on Google and the other main search engines. As a consequence, he 
complained, relevance on the Web is now a concept entirely alien to anything 
understood by information professionals. 


As the market leader, and primary innovator, it was Google that attracted the full 
force of Arnold's ire. "Indexing is not what you learned in library school," he said. "It’s 
what Google wants. Effectively, SEO is the new indexing model." 


In other words, the notions of comprehensiveness and objectivity long promulgated 
by information professionals as central to online searching have given way to a 
process whose raison d’être is to falsify search outcomes to satisfy commercial 
interests. "The SEO market has grown up to take advantage of this new idea of 
relevance," said Arnold. 


To underline the extent to which traditional notions of relevance have been 
undermined, Arnold cited research done by the UK-based Internet magazine .net, 
which found only a 3% overlap in search results listed on Google, Yahoo and 
AskJeeves when the same search term was input. "When is a hit relevant?" Arnold 
asked rhetorically. "Where is the boundary between SEO and 'real indexing'?" 
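
How an overlap figure of this kind might be computed is worth spelling out. The sketch below assumes the measure is the share of all distinct results that appear in every engine's list; the article does not say how .net actually calculated its 3%, and the result lists here are made up.

```python
def overlap_share(*result_lists):
    """Share of all distinct results that appear in every list (0.0 to 1.0)."""
    sets = [set(results) for results in result_lists]
    union = set().union(*sets)
    if not union:
        return 0.0
    common = set.intersection(*sets)
    return len(common) / len(union)


# Hypothetical top results from three engines for the same search term.
engine_a = ["u1", "u2", "u3", "u4"]
engine_b = ["u1", "u5", "u6", "u7"]
engine_c = ["u1", "u8", "u9", "u2"]
share = overlap_share(engine_a, engine_b, engine_c)  # one shared URL out of nine
```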


Worse, added Arnold, Google's dominance is growing all the time. Whereas in the 
previous quarter it had had a 51% share of weblog referrals in the US, for instance, 
this figure is now 62%. (Weblog referral logs collect information on who visits a website 
and how they arrived there.) 


Intellectual dishonesty 


After his presentation I asked Arnold why he objected to these developments. "It's 
intellectually dishonest," he replied. "These shortcuts trivialise indexing." Moreover, 
he added, it is dangerous. "If a medical term is misused, it could affect a person's life 
if the appropriate article is not found. Likewise, if a company doesn’t find the right 
patent document it could cost that company a lot of money. So I really disapprove." 


But is it really likely that a corporate lawyer or a doctor would rely on Google for an 
exhaustive patent or medical search? And are information consumers really as naive 
or stupid as Arnold implies? 


As Arnold himself acknowledged, most users probably don’t care if their search 
results are paid-for ad links, or the product of Google's algorithm. If someone is 
looking for a restaurant, for instance, what they want to find is a good-enough 
restaurant, not a long list of every possible eating house available, categorised by 
thirty different criteria, and listed by the number of available tables! After all, most of 
the sponsored links turn up on pages where users are looking for products or 
services. In this case Google is simply acting like a yellow pages directory. 


Moreover, even if it is true that web users don’t always understand the way search 
engines work, they are learning all the time. In fact, as a general rule, users know as 
much as they need to know, and this is usually more than information professionals 
give them credit for knowing! 


All in all, it was hard not to conclude that Arnold reflects the grumpy old man school 
of information science. As he himself admitted: "I'm old. I'm dying out." 


For all that, while deprecating SEO techniques, Arnold was happy enough to offer 
the audience five "cheats" they could use in order to ensure their web sites received 
higher rankings on Google. 


He also included in his presentation what amounted to a sponsored link. After 
explaining his five cheats, he told the audience they could find another five in his 
eBook on Google (The Google Legacy, How Google's Internet Search is 
Transforming Application Software), and invited them to buy it ($180 to you Madam!) 


Essentially, Arnold's view seemed to be that much is awry on the Web, but there is 
little to be done but accept it. 


They're watching us! 


But Arnold had a second point to make. While many still view Google as a search 
company, he argued, it was now far more than that. Currently offering 56 different 
services, he explained, Google is in the process of creating a completely new 
operating system — one moreover up to 40 times faster than anything that IBM or 
HP could offer, and based on anything between 155,000 and 165,000 servers. 


This too Arnold clearly deprecated, explaining that this "Googleplex" (a term he has 
appropriated from the name of Google's Mountain View headquarters) now encircles 
the world like the carapace of a tortoise — making Google the new AT&T; an AT&T, 
moreover, not subject to any regulation. Clearly in likening the Googleplex to a new 
operating system Arnold was also portraying Google as the new Microsoft. 


At this stage Arnold's presentation began to sound more like a conspiracy theory 
than factual exposition. Confiding to the audience that Google founders Larry Page 
and Sergey Brin had refused to speak to him once they realised his was a critical 
rather than adulatory voice, and referring to a series of patent thickets that Google 
has built around its technology (patents which his lawyer had, for some inexplicable 
reason, advised Arnold not to put up on the Web), he went on to complain that he 
had never provided his address to Google, yet the company nevertheless knew it. 
"Google knows where I live," he said dramatically. "I didn’t tell them. They are 
watching me!" 


And for those librarians still harbouring any illusion that by scanning books and 
making them available on the Web Google represents a force for good, Arnold 
depicted Google Print as a smokescreen. “The scanning of books is a red herring,” 
he said, adding that Google was like a magician into whose hand a quarter suddenly 
appears as if from nowhere. "Everyone looks at the quarter, not the magician." 


Fortunately, Arnold's presentational mode appeared to owe more to his predilection 
for drama — and a canny sense of how to market a new book — than to paranoia. It 
also had moments of humour. Fifteen minutes into his presentation, we were all 
evacuated after the hotel fire alarm was set off, giving Arnold the opportunity to yell: 
"You see, I'm so hot! This is what I use in bars to get women." 


Later, when we were allowed to re-enter the hotel to hear the rest of Arnold's 
presentation, the conference organiser announced that the alarm had been triggered 
by an old man smoking a cigar in his bed. "And that old man," promptly quipped 
Arnold, "is none other than Gregorovich Brin, Sergey's uncle." 


Not only is Google watching Arnold, it seems, but its founders have deployed their 
extended family to silence him! 


Real or perceived threat? 


But how seriously should we take Arnold's prognostications? He is, after all, not the 
only commentator to depict Google as the new Microsoft, or AT&T, and thus a 
significant monopoly threat. 


Interestingly, most now view Microsoft as somewhat grey at the temples. This more 
relaxed view, moreover, is a consequence not of the antitrust case against the 
company (after all, Judge Jackson's order to break up Microsoft was subsequently 
overturned by a federal court) but of the growth of new competitors like Google, 
and the rise of the open source software movement. 


That said, Arnold is right to deprecate the growing commercialisation of the Internet, 
and now that Google is a public company we can surely expect its "do no evil" ethos 
to come under increasing pressure from shareholders keen to see the return on their 
investment maximised. 


But leaving aside Arnold's dire predictions of an all-seeing, all powerful Googleplex 
encircling the world and pulling everyone into its monopolistic grasp, it is certainly 
worth asking how much of a monopoly threat Google represents to web searching. 
The answer seems to be: "Not as much of a threat as Arnold implies". Many, for 
instance, believe that large generic search engines are set to see their dominance 
diminish rather than increase. 


Commenting in an EcommerceTimes article earlier this year, the associate editor of 
SearchEngineWatch.com Chris Sherman argued that the bigger the Web grows, the 
less useful generic search engines become. As a consequence, he said, "We're 
seeing a real rise in vertical search engines, which are subject-specific or task- 
specific — shopping, travel and so on." He added: "We're going to see more of that 
going forward as people become more sophisticated and as these specialised search 
engines become better at what they do." 


Neither is Sherman a lone voice. Commenting in the same article Gartner Group's 
Rita Knox said: "People still need information on the Internet, but a more generic 
search capability like Google is going to be less useful." 


Self-fulfilling prophecy 


Time will tell. But the fundamental problem with Arnold's dark view of the future is 
that conspiracy theories tend to have a debilitating effect on our ability to act. We 
become less inclined to ward off the object of our fear if we believe it to be inevitable, 
creating a kind of self-fulfilling prophecy. 


Arnold is not the only one to be disenchanted with the growing commercialisation of 
the Web. Nor is he the only one to deplore Google's role in this. In a recent paper 
called The Commercial Search Engine Industry and Alternatives to the Oligopoly, for 
instance, Bettina Fabos, from the Media Research Center at Budapest University of 
Technology and Economics, makes very similar points. Her conclusion, however, is 
very different. 


Rather than portraying the situation as inevitable, and advising us to get over it, she 
concludes: "[T]o realize the web’s educational and non-commercial potential, 
educators and librarians need to move away from promoting individual skills 
(advanced searching techniques, web page evaluation skills) as a way to cope with 
excessive commercialism" and instead "address the increasing difficulties to locate 
content that is not commercial, and the misleading motives of the commercial, 
publicly-traded internet navigation tools, and the constant efforts among for-profit 
enterprise to bend the internet toward their ends." 


In other words, rather than rushing around like Private Frazer in the BBC sitcom 
Dad's Army shouting "We're all doomed", information professionals should adopt a 
more positive approach. Why not take the initiative and turn the technology in a more 
desirable direction? Why not fill the web with non-commercial content, and then build 
non-commercial tools to help users locate that content? 


Indeed, says Fabos, some are already at work doing just this. She commends, for 
instance, the activities of initiatives like the Internet Scout Project, which enables 
organisations to share knowledge and resources via the Web by putting their 
collections online; she commends Merlot, the free and open resource providing links 
to online learning materials; and she commends tools like iVia, and Data Fountains, 
designed to allow web users to discover and describe Internet resources about a 
particular topic. 


Open Access 


As it turns out, one of the more organised and advanced initiatives with the potential 
to help create a non-commercial web is the open access (OA) movement — a 
movement, in fact, in which librarians have always played a very active role. 


For while the movement's original impetus was solely to liberate scholarly peer- 
reviewed articles from behind the subscription firewalls imposed by commercial 
publishers, there are grounds for suggesting it could develop into something grander, 
in both scope and scale. How come? 


As scholarly publishers have consistently and obdurately refused to cooperate with 
the OA movement in its attempts to make scientific papers freely available on the 
Web, the emphasis of the movement has over time shifted from trying to persuade 
publishers to remove the toll barriers, to encouraging researchers to do it themselves 
by self-archiving their published papers, either in institutional repositories (IRs), or in 
subject-specific archives like the arXiv preprints repository and PubMed Central, the 
US National Institutes of Health free digital archive of biomedical and life sciences 
papers. 


And to help researchers do this, the OA movement has created an impressive 
collection of self-archiving tools, including archival software like Southampton 
University's Eprints, and MIT's DSpace; a standardised protocol to enable 
repositories to interoperate (the Open Archives Initiative Protocol for Metadata 
Harvesting, or OAI-PMH); and OAI-compliant search engines like the University of 
Michigan's OAIster, which harvest records from multiple OAI-compliant archives to 
create a single virtual archive. In this way hundreds of different repositories can be 
cross-searched using a single search interface — much like Google searches the 
Web. Essentially a vertical search engine, OAIster currently aggregates records from 
over 500 institutions. 
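
To give a flavour of what harvesting involves, here is a minimal sketch in Python using only the standard library. It issues the protocol's `ListRecords` request with the `oai_dc` (Dublin Core) metadata format, follows resumption tokens from page to page, and pools the records from several repositories into one list. It is not how OAIster itself is built; the function names and any repository addresses passed in are hypothetical, and a real harvester would also need error handling, incremental harvesting and de-duplication.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for one repository."""
    if resumption_token:
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_data):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_data)
    records = []
    for rec in root.iter(OAI + "record"):
        identifier = rec.findtext(OAI + "header/" + OAI + "identifier")
        titles = [t.text for t in rec.iter(DC + "title")]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//" + OAI + "resumptionToken")
    return records, (token.strip() if token and token.strip() else None)


def fetch_bytes(url):
    """Download one response; kept separate so it can be replaced in tests."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_urls, fetch=fetch_bytes):
    """Pool records from several repositories into one 'virtual archive'."""
    combined = []
    for base_url in base_urls:
        token = None
        while True:
            page = fetch(list_records_url(base_url, token))
            records, token = parse_list_records(page)
            for record in records:
                record["source"] = base_url
            combined.extend(records)
            if not token:
                break
    return combined
```

Calling `harvest()` with a list of repository base URLs returns a single list of records that could then be indexed and searched through one interface.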


But while the initial purpose of the Open Archives Initiative (OAI) was limited to 
scholarly papers, it has become apparent that its aims and its technology could have 
wider potential. As the OAI FAQ puts it, OA advocates came to realise that "the 
concepts in the OAI interoperability framework — exposing multiple forms of 
metadata through a harvesting protocol — had applications beyond the E-Print 
community." For this reason, the FAQ adds "the OAI has adopted a mission 
statement with broader application: opening up access to a range of digital 
materials." 


How might this work? Two years ago Clifford Lynch published a paper in which he 
argued that there is no reason why an institutional repository could not contain "the 
intellectual works of faculty and students — both research and teaching materials — 
along with documentation of the activities of the institution". It could also contain, he 
said: "experimental and observational data captured by members of the institution 
that support their scholarly activities." 


Indeed, Lynch added, repositories in higher educational establishments could also 
link with other organisations in order to extend and broaden what they offer. 
"[U]niversity institutional repositories have some very interesting and unexplored 
extensions to what we might think of as community or public repositories; this may in 
fact be another case of a concept developed within higher education moving more 
broadly into our society. Public libraries might join forces with local government, local 
historical societies, local museums and archives, and members of their local 
communities to establish community repositories. Public broadcasting might also 
have a role here." 


Need not end there 


And it need not end there. Why not use the OAI technology as the framework for an 
alternative non-commercial web; one encompassing as much as is deemed 
sufficiently valuable that it could benefit from being accessible outside the confines, 
constraints and biases of the commercial web? If users want to find a restaurant 
they could go to Google; but if they want to do a medical search then the non- 
commercial web would be a better choice. Data searchable within this alternative 
web would no doubt need to meet certain standards — in terms, for instance, of 
provenance, and depth and range of metadata etc. 


Self-archiving purists discourage such talk, fearful that it may distract the movement 
from the priority of "freeing the refereed literature". But the reality is that as research 
funders like the Wellcome Trust and Research Councils UK begin to mandate 
researchers to self-archive their research papers, so the number of institutional 
repositories is growing. And once a university or research organisation has an 
institutional repository there is an inescapable logic for that repository to develop in 
the kind of directions proposed by Lynch. 


It may be, of course, that in the end OAI technology is not appropriate for this job. It 
may also be wise not to distract the OA movement from its primary aim. But it is 
perhaps now only a matter of time before some such phenomenon develops. 
Initiatives like Google Print and Google Scholar have served to highlight growing 
concerns at the way commercial organisations are now calling all the shots in the 
development of the Web. And it is these concerns that are encouraging more and 
more people to think in terms of non-commercial alternatives. 


What we are beginning to see, says Fabos, is a "small but growing countervailing 
force to the commercialisation of 'the universe of knowledge'." What will drive these 
efforts, she adds "is the understanding that, in our commercial system, educators, 
librarians and citizens interested in nurturing a public sphere must work together to 
control the destiny of the internet — or somebody else will." 


Clearly there is a valuable potential role here for information professionals, should 
they choose to seize the opportunity. After all, what better way for disenchanted 
librarians to make themselves indispensable in a new and relevant way — not by 
playing their traditional role as gateways to information (putting themselves between 
the information and the user), but as facilitators able to help researchers and other 
data creators collaborate and share information. If this means abandoning some of 
their traditional skills for new ones then so be it. Now there's a topic for discussion at 
Internet Librarian International 2006! 


The fact is, it's time for information professionals to stop bemoaning the loss of some 
perceived golden age, and take control of the Web. In short, it's time to reach for the 
red pill! 